Physiological Factors of Obstructive Sleep Apnoea Severity

A Multiple Regression Analysis of Arousal Index and Associated Risk Factors

Author

Anthony R.

Published

May 21, 2025

1 Abstract

This study investigates key physiological risk factors associated with the severity of obstructive sleep apnoea (OSA). Using multiple linear regression, we analyze the relationship between the arousal index (ai), a measure of sleep disruption, and several predictors: age, body mass index (BMI), neck size, and systolic blood pressure (SBP). The results reveal that neck size, SBP, and age are statistically significant predictors of OSA severity, while BMI shows minimal contribution after accounting for multicollinearity with neck size. Diagnostic plots confirm model assumptions of linearity and normality, though the model explains only about 22% of the variability in ai. This suggests that other unobserved variables, such as lifestyle or genetic factors, may also influence OSA severity.

2 Introduction

Obstructive Sleep Apnoea (OSA) is a common sleep disorder characterized by repetitive pauses in breathing due to upper airway obstruction. These interruptions lead to sleep fragmentation, hypoxia, and increased cardiovascular risk. Understanding physiological predictors of OSA severity is essential for developing effective screening and prevention strategies.

In this study, the arousal index (ai), representing the frequency of awakenings per hour, is used as the dependent variable. Four predictors are considered based on their known associations with OSA:

  • Body Mass Index (BMI): a general indicator of body fat.

  • Neck Size: reflects airway obstruction potential.

  • Systolic Blood Pressure (SBP): captures cardiovascular strain.

  • Age: accounts for physiological changes that increase OSA risk.

My goal here is to determine which of these variables significantly predict ai and to assess the overall performance and appropriateness of the regression model.

3 Exploratory Analysis

3.1 Scatter-plot Matrix

Relationships between the response and predictors

  • ai appears to have a moderate positive correlation with sbp, neck_size, and age.

  • bmi and neck_size appear to be highly correlated.

The strong correlation between bmi and neck_size suggests that multicollinearity could be a concern in the regression model, since both variables reflect body composition.


3.2 Model Fitting and Interpretation

A linear regression model is fitted with ai as the response variable and bmi, neck_size, sbp, and age as the predictors.


Call:
lm(formula = ai ~ bmi + neck_size + sbp + age, data = sleep)

Coefficients:
(Intercept)          bmi    neck_size          sbp          age  
  -0.159406    -0.009852     0.040627     0.010218     0.008789  

Significant predictors (p < 0.05) included neck size, sbp, and age, whereas bmi was not statistically significant.


3.2.1 Producing a 95% CI that quantifies the change in ai for each extra cm of neck size:

Confidence interval at \(\alpha = 0.05\) (95% confidence level)

               2.5 %     97.5 %
neck_size 0.01248892 0.06876571

Therefore, we are 95% confident that each 1 cm increase in neck size is associated with an increase in the arousal index (ai) of between 0.012 and 0.069 on the log scale, with bmi, sbp, and age held constant.

This provides strong evidence that neck size is an important risk factor for obstructive sleep apnoea (OSA) severity, as measured by the frequency of arousal during sleep.
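The interval above can be reproduced by hand from the reported estimate and standard error. The sketch below is illustrative only: the critical value is hard-coded as an approximation (in R it would come from `qt(0.975, 117)`), and the conversion to a percentage change assumes that ai was transformed with the natural logarithm, which the report does not state explicitly.

```python
import math

# Values copied from the regression summary for neck_size.
estimate = 0.040627
std_error = 0.014208

# Assumption: approximate 97.5th percentile of the t-distribution with
# 117 residual degrees of freedom (hard-coded, not computed here).
t_crit = 1.9804

margin = t_crit * std_error
lower, upper = estimate - margin, estimate + margin
print(f"95% CI for neck_size: ({lower:.4f}, {upper:.4f})")

# Assumption: ai is on the natural-log scale, so exp(b) - 1 gives the
# proportional change in the untransformed arousal index per extra cm.
pct_low = (math.exp(lower) - 1) * 100
pct_high = (math.exp(upper) - 1) * 100
print(f"Approximate change per cm: {pct_low:.1f}% to {pct_high:.1f}%")
```

Under the natural-log assumption, the interval corresponds to an increase of roughly 1% to 7% in the arousal index per additional centimetre of neck size.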


3.2.2 F-test For The Overall Regression


Call:
lm(formula = ai ~ bmi + neck_size + sbp + age, data = sleep)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.67136 -0.32269  0.01491  0.35778  1.47595 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) -0.159406   0.518207  -0.308  0.75893   
bmi         -0.009852   0.011312  -0.871  0.38557   
neck_size    0.040627   0.014208   2.859  0.00503 **
sbp          0.010218   0.003555   2.875  0.00481 **
age          0.008789   0.002964   2.965  0.00367 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5417 on 117 degrees of freedom
Multiple R-squared:  0.2471,    Adjusted R-squared:  0.2213 
F-statistic: 9.598 on 4 and 117 DF,  p-value: 9.54e-07

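The overall F-statistic is 9.598 on 4 and 117 degrees of freedom (p-value 9.54e-07). Individually, neck_size, sbp, and age are significantly associated with the response variable, arousal index (ai), at the 0.05 level, while bmi is not.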
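Each t value in the coefficient table is the estimate divided by its standard error. The short sketch below recomputes them from the reported figures and compares them with an approximate two-sided 5% critical value (about 1.98 for 117 degrees of freedom, hard-coded here as an assumption). Small differences from the printed table come from rounding of the inputs.

```python
# Estimates and standard errors copied from the coefficient table.
coefficients = {
    "bmi": (-0.009852, 0.011312),
    "neck_size": (0.040627, 0.014208),
    "sbp": (0.010218, 0.003555),
    "age": (0.008789, 0.002964),
}

# Assumption: approximate two-sided 5% critical value for t with 117 df.
T_CRIT = 1.9804

t_values = {name: est / se for name, (est, se) in coefficients.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_values.items()}

for name, t in t_values.items():
    flag = "significant" if significant[name] else "not significant"
    print(f"{name:>9}: t = {t:6.3f} ({flag})")
```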


3.2.3 Fitting in the Multiple Regression Model

\[ ai_i = \beta_0 + \beta_1 \cdot \text{bmi}_i + \beta_2 \cdot \text{neck\_size}_i + \beta_3 \cdot \text{sbp}_i + \beta_4 \cdot \text{age}_i + \varepsilon_i \]

Model Parameters:

  • \(ai_i\) = arousal index for the i-th individual (log scale)

  • \(\beta_0\) = intercept

  • \(\beta_1\) = coefficient for bmi

  • \(\beta_2\) = coefficient for neck_size

  • \(\beta_3\) = coefficient for sbp

  • \(\beta_4\) = coefficient for age

  • \(\varepsilon_i\) = random error for the i-th individual

This model aims to assess the overall relationship between the arousal index and the set of physiological predictors: Body Mass Index (bmi), neck size, Systolic Blood Pressure (sbp), and age.
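
Substituting the estimates from the coefficient table in Section 3.2.2 (rounded to four decimal places) gives the fitted equation:

\[ \widehat{ai}_i = -0.1594 - 0.0099 \cdot \text{bmi}_i + 0.0406 \cdot \text{neck\_size}_i + 0.0102 \cdot \text{sbp}_i + 0.0088 \cdot \text{age}_i \]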


3.2.4 Hypothesis for the Overall ANOVA Test:

The null hypothesis (\(H_0\)) states that none of the predictors has an effect on the response variable, the arousal index (ai).

  • \(H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0\)

The alternative hypothesis (\(H_1\)) states that at least one of the predictors has an effect on the arousal index (ai).

  • \(H_1: \beta_j \ne 0\) for at least one \(j \in \{1, 2, 3, 4\}\)

3.2.5 ANOVA Table for the Overall Model

Analysis of Variance Table

Response: ai
           Df Sum Sq Mean Sq F value    Pr(>F)    
bmi         1  1.725  1.7250  5.8777 0.0168638 *  
neck_size   1  3.288  3.2881 11.2040 0.0010982 ** 
sbp         1  3.674  3.6739 12.5185 0.0005789 ***
age         1  2.580  2.5798  8.7904 0.0036707 ** 
Residuals 117 34.337  0.2935                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     value      numdf      dendf 
  9.597645   4.000000 117.000000 

F-test for this regression model: \(F(4, 117) = 9.598\), \(p < 0.001\)

Since the p-value is well below 0.05, we reject the null hypothesis. This indicates that at least one of the predictors is significantly related to the response variable.
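The overall F-statistic can be recovered from the ANOVA table by pooling the sequential sums of squares of the four predictors and dividing the resulting mean square by the residual mean square. The following is a minimal sketch using the rounded values printed in the table, so the result agrees with the reported 9.598 only up to rounding.

```python
# Sequential sums of squares copied from the ANOVA table.
ss_predictors = {"bmi": 1.725, "neck_size": 3.288, "sbp": 3.674, "age": 2.580}
ss_residual, df_residual = 34.337, 117

# Pool the predictor rows into a single regression sum of squares.
ss_regression = sum(ss_predictors.values())
df_regression = len(ss_predictors)

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

print(f"F({df_regression}, {df_residual}) = {f_statistic:.3f}")
```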


3.2.6 Null Distribution for the test statistic

Under the null hypothesis (\(H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0\)), the F-statistic follows an F-distribution with 4 and 117 degrees of freedom.

That is, when \(H_0\) holds, the test statistic satisfies \(F \sim F(4, 117)\).


3.2.7 P-value

The corresponding P-value for this overall regression model is:

\[\text{p-value} = 9.54 \times 10^{-7}\]

(or \(0.000000954\))

This p-value is taken from the model summary in Section 3.2.2.


3.2.8 Findings

Statistical Conclusion:
Because the p-value from the overall F-test is extremely small (\(9.54 \times 10^{-7}\)) and well below the significance level (\(\alpha = 0.05\)), we reject the null hypothesis that none of the predictors (bmi, neck_size, sbp, age) is related to the response variable.

Contextual Conclusion:
There is strong evidence that at least one of the predictors is significantly associated with the arousal index (ai). The regression model as a whole therefore explains a statistically significant share of the variability in ai.

In plain terms, the association is very unlikely to be due to chance; there is a clear relationship between OSA severity and the selected body measurements.


3.3 Model Validation & Appropriateness

Checking residual vs fitted plots for linearity and constant variance

The residuals are generally evenly scattered around the horizontal line at 0, which supports the assumptions of linearity and constant variance. There is no distinct shape or pattern, apart from a slight upward curve on the right.

Additionally, there are a few outliers present, at points 68, 69, and 79; however, they do not demonstrate excessive influence.

Checking the normality of residuals

The normal Q-Q plot of the standardized residuals shows that they mostly follow the reference line, with very slight deviations at the tails. This suggests that the normality assumption is met, with no major concerns about non-normality.

Conclusion: Based on these diagnostic checks, the full multiple regression model appears appropriate for explaining the variability in the arousal index (ai).


3.4 \(R^2\) and Its Significance

[1] 0.221317

Adjusted \(R^2\) = 0.2213. This is quite a low value: only about 22% of the variability in ai is explained by the predictor variables, while the remaining 78% is due to factors not included in the model or to random variation.
Although the model is statistically significant, its predictive power is limited and it leaves room for improvement.
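
Both \(R^2\) and the adjusted \(R^2\) follow directly from the sums of squares in the ANOVA table. The sketch below assumes a sample size of 122, inferred from the 117 residual degrees of freedom plus four predictors and an intercept, and uses the rounded table values, so it matches the reported figures only approximately.

```python
# Sums of squares copied from the ANOVA table in Section 3.2.5.
ss_regression = 1.725 + 3.288 + 3.674 + 2.580
ss_residual = 34.337

# Assumption: n inferred as residual df (117) + 4 predictors + 1 intercept.
n_obs = 122
n_predictors = 4

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"R-squared:          {r_squared:.4f}")
print(f"Adjusted R-squared: {adj_r_squared:.4f}")
print(f"Unexplained share:  {1 - r_squared:.1%}")
```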


4 Improving the Model

4.1 Checking for predictor significance


Call:
lm(formula = ai ~ bmi + neck_size + sbp + age, data = sleep)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.67136 -0.32269  0.01491  0.35778  1.47595 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) -0.159406   0.518207  -0.308  0.75893   
bmi         -0.009852   0.011312  -0.871  0.38557   
neck_size    0.040627   0.014208   2.859  0.00503 **
sbp          0.010218   0.003555   2.875  0.00481 **
age          0.008789   0.002964   2.965  0.00367 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5417 on 117 degrees of freedom
Multiple R-squared:  0.2471,    Adjusted R-squared:  0.2213 
F-statistic: 9.598 on 4 and 117 DF,  p-value: 9.54e-07

bmi is not a significant predictor of ai in this model (p = 0.386) and, as noted in Section 3.1, is strongly correlated with neck_size. I therefore remove it and re-evaluate the model assumptions.


Call:
lm(formula = ai ~ age + neck_size + sbp, data = sleep)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.65415 -0.35334  0.04008  0.37534  1.45627 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) -0.066166   0.506509  -0.131  0.89629   
age          0.009007   0.002951   3.053  0.00280 **
neck_size    0.032630   0.010831   3.013  0.00317 **
sbp          0.009579   0.003475   2.757  0.00676 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5412 on 118 degrees of freedom
Multiple R-squared:  0.2422,    Adjusted R-squared:  0.2229 
F-statistic: 12.57 on 3 and 118 DF,  p-value: 3.452e-07

The adjusted \(R^2\) increases very slightly, from 0.221 to 0.223. The overall p-value has dropped (from 9.54e-07 to 3.452e-07) and remains well below 0.05, so the model fit has not worsened. All remaining predictors are significant.

The residuals vs fitted plot shows the residuals scattered evenly around zero with no visible trend or pattern, apart from a slight upward deviation on the right.

The normal Q-Q plot is close to linear, with no significant deviations.

It is therefore reasonable to conclude that the linearity and normality assumptions are met for the refined model.

Therefore, final model:

\[ ai_i = \beta_0 + \beta_1 \cdot \text{age}_i + \beta_2 \cdot \text{sbp}_i + \beta_3 \cdot \text{neck\_size}_i + \varepsilon_i \]
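
To illustrate how the final model would be used, the sketch below plugs the refined-model estimates into the equation for one made-up individual. The input values (age 50, neck size 40 cm, SBP 130) are hypothetical and chosen only for illustration, the function name is not from the original analysis, and the units of sbp are assumed to match those of the dataset. The result is on the same log scale as ai.

```python
# Estimates copied from the refined model summary (ai ~ age + neck_size + sbp).
INTERCEPT = -0.066166
COEF_AGE = 0.009007
COEF_NECK_SIZE = 0.032630
COEF_SBP = 0.009579


def predict_log_ai(age, neck_size, sbp):
    """Return the fitted arousal index on the log scale (hypothetical helper)."""
    return INTERCEPT + COEF_AGE * age + COEF_NECK_SIZE * neck_size + COEF_SBP * sbp


# Hypothetical individual, not taken from the dataset.
predicted = predict_log_ai(age=50, neck_size=40, sbp=130)
print(f"Predicted ai (log scale): {predicted:.3f}")
```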


4.1.1 Comparing Model Fits

Model      R²       Adjusted R²   Predictors                 Comment
Original   0.2471   0.2213        BMI, Neck Size, SBP, Age   BMI not significant
Refined    0.2422   0.2229        Neck Size, SBP, Age        Slight improvement

The \(R^2\) dropped marginally (from 0.2470586 to 0.2421771) because a predictor was removed; \(R^2\) cannot increase when a predictor is dropped. The adjusted \(R^2\) penalises the number of predictors in the model and rises only when a variable's contribution outweighs that penalty. Its increase (from 0.221317 to 0.2229104) after removing bmi indicates that bmi contributed little to explaining ai, so the refined model is more parsimonious with no meaningful loss of fit.
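
The comparison can be checked numerically from the two reported \(R^2\) values. For a single dropped predictor, the partial F-statistic equals the square of that predictor's t value in the full model, and the adjusted \(R^2\) rises on removal exactly when that |t| is below 1. The sketch below assumes a sample size of 122 (inferred from the degrees of freedom) and uses helper names that are not part of the original analysis.

```python
import math

# R-squared values reported for the original and refined models.
r2_full, r2_reduced = 0.2470586, 0.2421771
n_obs = 122          # assumption: 117 residual df + 4 predictors + intercept
df_resid_full = 117
n_dropped = 1        # only bmi was removed


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n rows."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Partial F-test for dropping bmi; its square root is |t| for bmi.
partial_f = ((r2_full - r2_reduced) / n_dropped) / ((1 - r2_full) / df_resid_full)
implied_t = math.sqrt(partial_f)

adj_full = adjusted_r2(r2_full, n_obs, 4)
adj_reduced = adjusted_r2(r2_reduced, n_obs, 3)

print(f"Partial F = {partial_f:.3f}, implied |t| for bmi = {implied_t:.3f}")
print(f"Adjusted R-squared: {adj_full:.4f} -> {adj_reduced:.4f}")
```

The implied |t| of about 0.87 matches the bmi row of the full-model summary and is below 1, which is why the adjusted \(R^2\) rises slightly once bmi is dropped.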


5 Discussion

The findings confirm that neck size, blood pressure, and age are important factors associated with OSA severity. BMI, although commonly linked to sleep apnoea, was not significant here once neck size was included, suggesting that neck circumference captures the effect of body mass more directly for airway obstruction.

Still, since the model only explains 22% of ai variation, other factors such as genetic traits, lifestyle habits, or anatomical structures likely play major roles.


6 Conclusion

This study shows that neck size, systolic blood pressure, and age are key predictors of OSA severity, measured by the arousal index.

Even though the model explains only a modest portion of the variability, it provides valuable insight into which physical traits are most strongly linked to disrupted sleep. Improving future models with additional lifestyle and physiological data could lead to better prediction and prevention strategies for OSA.